
    Relational Knowledge Extraction from Attribute-Value Learners

    Bottom Clause Propositionalization (BCP) is a recent propositionalization method that enables fast relational learning. Propositional learners can use BCP to obtain accuracy comparable to that of Inductive Logic Programming (ILP) learners. Unlike with ILP learners, however, what has been learned cannot normally be represented in first-order logic. In this paper, we propose an approach and introduce a novel algorithm for extracting first-order rules from propositional rule learners applied to data propositionalized with BCP. A theorem then shows that the extracted first-order rules are consistent with their propositional counterparts. The algorithm was evaluated using the rule learner RIPPER, although it can be applied to any propositional rule learner. Initial results show that the accuracies of both RIPPER and the extracted first-order rules are comparable to those obtained by Aleph (a traditional ILP system), while our approach is considerably faster (achieving speed-ups of over an order of magnitude) and generates a compact rule set with at least the same representational power as standard ILP learners.
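    As a rough illustration (not the paper's implementation), the Python sketch below shows the BCP idea this abstract builds on: each bottom-clause body literal becomes one boolean attribute, a propositional learner is trained on the resulting vectors, and any attribute the model tests can be read back as its first-order literal. The literals and examples are invented, and scikit-learn's DecisionTreeClassifier stands in for RIPPER.

    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical bottom-clause body literals per example (normally produced by an
    # ILP engine such as Progol or Aleph from background knowledge).
    examples = [
        ({"parent(A,B)", "male(A)"}, 1),                   # positive: father(A,B)
        ({"parent(A,B)", "female(A)"}, 0),                 # negative
        ({"male(A)"}, 0),                                  # negative
        ({"parent(A,B)", "male(A)", "married(A,C)"}, 1),   # positive
    ]

    # BCP: every distinct literal becomes one boolean attribute.
    vocabulary = sorted(set().union(*(lits for lits, _ in examples)))
    X = [[int(lit in lits) for lit in vocabulary] for lits, _ in examples]
    y = [label for _, label in examples]

    clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

    # A propositional test on attribute i maps back to the first-order literal
    # vocabulary[i], which is the core idea behind rule extraction from BCP.
    for lit, importance in zip(vocabulary, clf.feature_importances_):
        if importance > 0:
            print("literal used by the propositional model:", lit)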

    A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models

    Background: Remote homology detection is a hard computational problem. Most approaches train computational models using either full protein sequences or multiple sequence alignments (MSAs), including all positions. However, for proteins in the "twilight zone" only some segments of the sequences (motifs) are conserved. We introduce a novel logical representation that allows us to express physico-chemical properties of sequences, conserved amino acid positions, and conserved physico-chemical positions in the MSA. From this, Inductive Logic Programming (ILP) finds the most frequent patterns (motifs) and uses them to train propositional models, such as decision trees and support vector machines (SVMs). Results: We use the SCOP database to perform our experiments, evaluating protein recognition within the same superfamily. Our results show that our methodology, when using SVMs, performs significantly better than some state-of-the-art methods and is comparable to others. However, our method provides a comprehensible set of logical rules that can help to understand what determines a protein's function. Conclusions: The strategy of selecting only the most frequent patterns is effective for remote homology detection. This is possible through a suitable first-order logical representation of homologous properties and through a set of frequent patterns, found by an ILP system, that summarizes the essential features of protein functions.
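    As a rough sketch of the pipeline described above (not the paper's code), the following Python snippet trains an SVM on boolean pattern-occurrence features, as one might once ILP has produced a set of frequent motifs; the motif-match matrix and labels are synthetic placeholders.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n_sequences, n_patterns = 60, 20
    # X[i, j] = 1 if sequence i matches frequent first-order pattern j (fabricated here).
    X = rng.integers(0, 2, size=(n_sequences, n_patterns))
    # Toy labels standing in for superfamily membership.
    y = (X[:, 0] | X[:, 3]).astype(int)

    svm = SVC(kernel="rbf", C=1.0)
    print("5-fold CV accuracy:", cross_val_score(svm, X, y, cv=5).mean())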

    Improving model construction of profile HMMs for remote homology detection through structural alignment

    Background: Remote homology detection is a challenging problem in bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches to this important problem. pHMM packages have a relatively small computational cost and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Here, we assess the impact of using structural alignments on pHMM performance. Results: We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over superfamilies. Performance was evaluated through ROC curves and a paired two-tailed t-test. Conclusion: We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignments in low-identity regions, mainly below 20% identity. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher-quality pHMMs. On the other hand, the sensitivity of these tools is still quite low in these low-identity regions. Our results suggest a number of possible directions for improvement in this area.
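    The evaluation protocol sketched in this abstract can be mimicked in a few lines of Python: detection scores from structure-based and sequence-based pHMMs are compared through ROC analysis and a paired two-tailed t-test. The score arrays below are synthetic stand-ins for real HMMER-style bit scores, so the numbers only illustrate the calls.

    import numpy as np
    from scipy.stats import ttest_rel
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(1)
    labels = rng.integers(0, 2, size=200)                 # 1 = true remote homolog
    scores_struct = labels * 2.0 + rng.normal(size=200)   # pHMMs from structural alignments
    scores_seq = labels * 1.5 + rng.normal(size=200)      # pHMMs from sequence alignments

    print("AUC, structure-based pHMMs:", roc_auc_score(labels, scores_struct))
    print("AUC, sequence-based pHMMs: ", roc_auc_score(labels, scores_seq))
    # The paper applies the paired two-tailed t-test over per-family results; here it
    # is applied over the matched synthetic scores purely to illustrate the test.
    print("paired two-tailed t-test:", ttest_rel(scores_struct, scores_seq))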

    Fast relational learning using bottom clause propositionalization with artificial neural networks

    Relational learning can be described as the task of learning first-order logic rules from examples. It has enabled a number of new machine learning applications, e.g. graph mining and link analysis. Inductive Logic Programming (ILP) performs relational learning either directly, by manipulating first-order rules, or through propositionalization, which translates the relational task into an attribute-value learning task by representing subsets of relations as features. In this paper, we introduce a fast method and system for relational learning based on a novel propositionalization called Bottom Clause Propositionalization (BCP). Bottom clauses are boundaries in the hypothesis search space used by the ILP systems Progol and Aleph. Bottom clauses carry semantic meaning and can be mapped directly onto numerical vectors, simplifying the feature-extraction process. We have integrated BCP with a well-known neural-symbolic system, C-IL2P, to perform learning from numerical vectors. C-IL2P uses background knowledge in the form of propositional logic programs to build a neural network. The integrated system, which we call CILP++, handles first-order logic knowledge and is available for download from Sourceforge. We have evaluated CILP++ on seven ILP datasets, comparing results with Aleph and a well-known propositionalization method, RSD. The results show that CILP++ can achieve accuracy comparable to Aleph's while being generally faster. BCP achieved a statistically significant improvement in accuracy over RSD when running with a neural network, but BCP and RSD perform similarly when running with C4.5. We have also extended CILP++ to include a statistical feature selection method, mRMR, with preliminary results indicating that a reduction of more than 90% of the features can be achieved with only a small loss of accuracy.
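    A minimal sketch of the BCP-to-network path follows, assuming invented bottom clauses and using an off-the-shelf multilayer perceptron as a stand-in for the network that CILP++ builds from background knowledge via C-IL2P.

    from sklearn.neural_network import MLPClassifier

    # Hypothetical bottom-clause bodies with labels; in CILP++ these come from an
    # ILP-style saturation step over the relational examples.
    bottom_clauses = [
        ({"atm(A,c)", "bond(A,B)", "atm(B,o)"}, 1),
        ({"atm(A,c)", "bond(A,B)", "atm(B,h)"}, 0),
        ({"atm(A,n)", "bond(A,B)", "atm(B,o)"}, 1),
        ({"atm(A,h)"}, 0),
    ]

    # Map each bottom clause onto a binary vector, one position per distinct literal.
    features = sorted(set().union(*(lits for lits, _ in bottom_clauses)))
    X = [[int(f in lits) for f in features] for lits, _ in bottom_clauses]
    y = [label for _, label in bottom_clauses]

    net = MLPClassifier(hidden_layer_sizes=(4,), max_iter=5000, random_state=0).fit(X, y)
    print("predictions on the training vectors:", net.predict(X))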

    A nonmonotonic multi-agent logic of belief: a Modal Defeasible Relevant approach


    Theory Refinement of Bayesian Logic Programs

    Bayesian Logic Programs (BLPs) [8][9] are a powerful and elegant framework for combining the expressiveness of first-order logic with Bayesian networks. They can represent both Bayesian networks and logic programs, and their kernel in Prolog is an adaptation of a standard Prolog meta-interpreter. The framework has been successfully compared to other such proposals in the literature. In this paper, we present a theory refinement system called RBLP, which minimally modifies a given BLP to make it consistent with the available training data. RBLP unifies FORTE [20], which induces revisions of logic programs from data, with an adaptation of EM [10] for learning the entries of the CPTs.
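    The parameter-learning half of RBLP can be illustrated with a small EM-style update of a single conditional probability table; the toy structure (one parent A, one child B) and the data below are made up for the sketch and are not taken from the paper.

    import numpy as np

    # Rows are (A, B); B == -1 marks a missing observation of B.
    data = np.array([[0, 0], [0, 0], [0, 1], [1, 1], [1, -1], [1, -1], [0, -1]])

    p_b_given_a = np.full((2, 2), 0.5)   # initial CPT: P(B = b | A = a)

    for _ in range(20):
        counts = np.zeros((2, 2))
        for a, b in data:
            if b >= 0:                   # observed: add a hard count
                counts[a, b] += 1
            else:                        # E-step: add expected (soft) counts for B
                counts[a] += p_b_given_a[a]
        p_b_given_a = counts / counts.sum(axis=1, keepdims=True)   # M-step

    print(p_b_given_a)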